Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information

Neural Information Processing Systems

This work introduces macro-action discovery using value-of-information (VoI) for robust and efficient planning in partially observable Markov decision processes (POMDPs). POMDPs are a powerful framework for planning under uncertainty. Previous approaches have used high-level macro-actions within POMDP policies to reduce planning complexity. However, macro-action design is often heuristic and rarely comes with performance guarantees. Here, we present a method for extracting belief-dependent, variable-length macro-actions directly from a low-level POMDP model. We construct macro-actions by chaining sequences of open-loop actions together when the task-specific VoI (the change in expected task performance caused by observations in the current planning iteration) is low. Importantly, we provide performance guarantees on the resulting VoI macro-action policies in the form of bounded regret relative to the optimal policy. In simulated tracking experiments, we achieve higher reward than both closed-loop and hand-coded macro-action baselines, selectively using VoI macro-actions to reduce planning complexity while maintaining near-optimal task performance.
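To make the VoI idea in the abstract concrete, here is a minimal one-step sketch in Python. It is not the paper's algorithm: it assumes a tiny discrete problem with a known reward table and observation model, and it measures VoI as the gap between the best expected reward when the action is chosen after the observation (closed-loop) and when it is committed beforehand (open-loop). The function names, the array layout, and the `threshold` parameter are all hypothetical and chosen for illustration only.

```python
def open_loop_value(belief, rewards):
    """Best expected reward when the action is committed before observing.

    belief[s]     = probability of state s
    rewards[a][s] = reward for taking action a in state s (assumed layout)
    """
    return max(sum(b * r for b, r in zip(belief, row)) for row in rewards)


def closed_loop_value(belief, rewards, obs_model):
    """Expected reward when the action is chosen after seeing the observation.

    obs_model[o][s] = P(o | s) (assumed layout)
    """
    total = 0.0
    for obs_row in obs_model:
        # Unnormalized posterior: P(o | s) * b(s) for each state s.
        joint = [p * b for p, b in zip(obs_row, belief)]
        # Best action for this observation, weighted by how likely it is.
        total += max(sum(j * r for j, r in zip(joint, row)) for row in rewards)
    return total


def value_of_information(belief, rewards, obs_model):
    """One-step VoI: what observing first is worth over acting open-loop."""
    return closed_loop_value(belief, rewards, obs_model) - open_loop_value(belief, rewards)


def should_extend_macro_action(belief, rewards, obs_model, threshold):
    """Hypothetical rule: keep chaining open-loop actions while VoI is low."""
    return value_of_information(belief, rewards, obs_model) <= threshold


# Toy problem: two states, two actions (guess the state), a 90%-accurate sensor.
REWARDS = [[1.0, 0.0], [0.0, 1.0]]
OBS_MODEL = [[0.9, 0.1], [0.1, 0.9]]

if __name__ == "__main__":
    for belief in ([0.5, 0.5], [0.99, 0.01]):
        voi = value_of_information(belief, REWARDS, OBS_MODEL)
        print(belief, round(voi, 6), should_extend_macro_action(belief, REWARDS, OBS_MODEL, 0.05))
```

In this toy setting the uniform belief yields a VoI of about 0.4, so an observation is worth waiting for, while the confident belief yields a VoI of about 0, so an open-loop step costs essentially nothing. The method described in the abstract applies this kind of test inside a planner over sequences of actions, with regret bounds that this sketch does not attempt to reproduce.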


Review for NeurIPS paper: Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information

Weaknesses: The work is not well presented. Terms such as open-loop actions, closed-loop policies, and reachable belief space are used without definition. As a result, the reviewer had difficulty understanding Figures 1 and 2. Value of information is central to this work, but it is only briefly discussed in Section 4.1. The major concern is the evaluation of the proposed methods. The POMDP community has provided a number of benchmark problems.


Review for NeurIPS paper: Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information

The authors did a good job of addressing reviewer concerns in the response. There were some lingering concerns about whether the authors had picked the most appropriate baselines for their experiments. Additional experiments or a more careful justification of the choices made would help. I would recommend that the authors take the reviewers' comments into account in preparing the final version of the paper.

